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Recent advances in quantum technology have led to the development and manufacturing of ex¬ 
perimental programmable quantum annealing optimizers that contain hundreds of quantum bits. 
These optimizers, named ‘D-Wave’ chips, promise to solve practical optimization problems poten¬ 
tially faster than conventional ‘classical’ computers. Attempts to quantify the quantum nature of 
these chips have been met with both excitement and skepticism but have also brought up numerous 
fundamental questions pertaining to the distinguishability of quantum annealers from their classical 
thermal counterparts. Here, we propose a general method aimed at answering these, and apply it 
to experimentally study the D-Wave chip. Inspired by spin-glass theory, we generate optimization 
problems with a wide spectrum of ‘classical hardness’, which we also define. By investigating the 
chip’s response to classical hardness, we surprisingly find that the chip’s performance scales unfavor¬ 
ably as compared to several analogous classical algorithms. We detect, quantify and discuss purely 
classical effects that possibly mask the quantum behavior of the chip. 


I. INTRODUCTION 

Interest in quantum computing originates in the poten¬ 
tial of quantum computers to solve certain computational 
problems much faster than is possible classically, due to 
the unique properties of Quantum Mechanics [H |2] . The 
implications of having at our disposal reliable quantum 
computing devices are obviously tremendous. The actual 
implementation of quantum computing devices is how¬ 
ever hindered by many challenging difficulties, the most 
prominent being the control or removal of quantum deco¬ 
herence [3]. In the past few years, quantum technology 
has matured to the point where limited, task-specific, 
non-universal quantum devices such as quantum com¬ 
munication systems, quantum random number genera¬ 
tors and quantum simulators, are being built, possessing 
capabilities that exceed those of corresponding classical 
computers. 

Recently, a programmable quantum annealing ma¬ 
chine, known as the D-Wave chip [3], has been built 
whose goal is to minimize the cost functions of classically- 
hard optimization problems presumably by adiabatically 
quenching quantum fluctuations. If found useful, the 
chip could be regarded as a prototype for general-purpose 
quantum optimizers, due to the broad range of hard the¬ 
oretical and practical problems that may be encoded on 
it. 

The capabilities, performance and underlying physi¬ 
cal mechanism driving the D-Wave chip have generated 
fair amounts of curiosity, interest, debate and controversy 
within the Quantum Computing community and beyond, 
as to the true nature of the device and its potential to 
exhibit clear “quantum signatures”. While some studies 
have concluded that the behavior of the chip is consistent 
with quantum open-system Lindbladian dynamics [5] or 
indirectly observed entanglement |5], other studies con¬ 


testing these [3 [S] pointed to the existence of simple, 
purely classical, models capable of exhibiting the main 
characteristics of the chip. 

Nonetheless, the debate around the quantum nature 
of the chip has raised several fundamental questions per¬ 
taining to the manner in which quantum devices should 
be characterized in the absence of clear practical “signa¬ 
tures” such as (quantum) speedups [U [10] . Since quan¬ 
tum annealers are meant to utilize an altogether different 
mechanism for solving optimization problems than tra¬ 
ditional classical devices, methods for quantifying this 
difference are expected to serve as important theoretical 
tools while also having vast practical implications. 

Here, we propose a method that partly solves the above 
question by providing a technique to characterize, or 
measure, the “classicality” of quantum annealers. This is 
done by studying the algorithmic performance of quan¬ 
tum annealers on sets of optimization problems possess¬ 
ing quantifiable, varying degrees of “thermal” or “classi¬ 
cal” hardness, which we also define for this purpose. To 
illustrate the potential of the proposed technique, we ap¬ 
ply it to the experimental quantum annealing optimizer, 
the D-Wave Two (DW2) chip. 

We observe several distinctive phenomena that reveal 
a strong correlation between the performance of the chip 
and classical hardness: i) The D-Wave chip’s typical 
time-to-solution (tg) as a function of classical hardness 
scales differently, in fact worse, than that of thermal 
classical algorithms. Specifically, we find that the chip 
does very poorly on problem instances exhibiting a phe¬ 
nomenon known as “temperature chaos”. ii) Fluctua¬ 
tions in success probability between programming cycles 
become larger with increasing hardness, pointing to the 
fact that encoding errors become more pronounced and 
destructive with increasing hardness, iii) The success 
probabilities obtained from harder instances are affected 
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more than easy instances by changes in the duration of 
the anneals. 

Analyzing the above findings, we identify two major 
probable causes for the chip’s observed “sub-classical” 
performance, namely i) that its temperature may not be 
low enough, and ii) that encoding errors become more 
pronounced with increasing hardness. We further offer 
experiments and simulations designed to detect and sub¬ 
sequently rectify these so as to enhance the chip’s capa¬ 
bilities. 


II. CLASSICAL HARDNESS, TEMPERATURE 
CHAOS AND PARALLEL TEMPERING 

In order to study the manner in which the performance 
of quantum annealers correlates with ‘classical hardness’, 
it is important to first accurately establish the meaning 
of classical hardness. For that purpose, we refer to spin- 
glass theory which deals with spin glasses — disor¬ 
dered, frustrated spin systems that may be viewed as 
prototypical classically-hard (also called NP-hard) opti¬ 
mization problems, that are so challenging that special¬ 
ized hardware has been built to simulate them p^l4j . 

Currently, the (classical) method of choice to study 
general spin-glass problems is Parallel Tempering (PT, 
also known as ‘exchange Monte Carlo ’) [min]. PT is a 
refinement of the celebrated yet somewhat outdated Sim¬ 
ulated Annealing algorithm that finds optimal as¬ 
signments (i.e., the ground-state configurations) of given 
discrete-variable cost functions. It is therefore only natu¬ 
ral to make use of the performance of PT to characterize 
and quantify classical hardness. 

In PT simulations, one considers Nt copies of an N- 
spin system at temperatures Ti < T 2 < ... < Tatt i where 
each copy undergoes Metropolis spin-flip updates inde¬ 
pendently of other copies. In addition, copies with neigh¬ 
boring temperatures regularly attempt to swap their tem¬ 
peratures with probabilities that satisfy detailed bal¬ 
ance [18j . In this way, each copy performs a tempera¬ 
ture random-walk (see inset of Fig[^. At high temper¬ 
atures, free-energy barriers are easily overcome, allowing 
for a global exploration of configuration space. At lower 
temperatures on the other hand, the local minima are 
explored in more detail. A ‘healthy’ PT simulation re¬ 
quires an unimpeded temperature flow: The total length 
of the simulation should be longer than the temperature 
‘mixing time ’r mus]. The time r may be thought of as 
the average time it takes each copy to fully traverse the 
temperature mesh, indicating equilibration of the simu¬ 
lation. Therefore, instances with large t are harder to 
equilibrate, which motivates our definition of the mixing 
time T as the classical hardness of a given instance. 

Despite the popularity of PT, it has also become appar¬ 
ent that not all the spin-glass problems can be efficiently 
solved by the algorithm [20l [2T]. The reason is a phe¬ 
nomenon that has become known as Temperature Chaos 
(TC). TC [HHSS] consists of a sequence of first-order 



10 ^ 10 '* 10 ^ 10 ® 
T (Metropolis sweeps) 


FIG. 1. Probability distribution of mixing times r over 
random J = ±1 ‘Chimera’ instances as extracted from 
the PT random-walk on the temperature grid. The 

solid line is a linear fit to the tail of the distribution, implying 
the existence of rare instances with very long mixing times 
(note that a tail is not strictly integrable, hence this spe¬ 
cific power law decay should only be regarded as a finite-size 
approximation). Inset: Example of a temperature random 
walk for an instance with r « 1.8 x 10®. Considering one of 
Nt = 30 copies of the system, at any given Monte Carlo time 
t, the copy’s temperature is Tj. In this example, the replica 
has visited each temperature several times, pointing to the 
fact that the simulation time is longer than the mixing time 

T. 

phase transitions that a given spin-glass instance experi¬ 
ences upon lowering its temperature, whereby the domi¬ 
nant configurations minimizing the free energy above the 
critical temperatures are vastly different than those be¬ 
low them (Fig. [^depicts such a phase that is ‘rounded’ 
due to the finite size of the system). A given instance may 
experience zero, one or more transitions at random tem¬ 
peratures, making the study of TC excruciatingly diffi¬ 
cult imiMHSS]. Such TC transitions hinder the PT tem¬ 
perature flow, significantly prolonging the mixing time 
T. In practice [2T], it is found that for small systems the 
large majority of the instances do not suffer any TC tran¬ 
sitions and are ‘easy’ (i.e., they are characterized by short 
mixing times). However, for a minor fraction of them, r 
turns out to be inordinately large. Moreover, the larger 
the system is, the larger the fraction of long-r samples 
becomes. In the large N limit, these are the short-r sam¬ 
ples that become exponentially rare in N [111132]. This 
provides further motivation for studying TC instances of 
optimization problems on moderately small experimental 
devices (even if they are rare). 


HI. TEMPERATURE CHAOS AND QUANTUM 
ANNEALERS 

With the advent of quantum annealers, which presum¬ 
ably offer non-thermal mechanisms for finding ground 
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FIG. 2. Top: Energy above the ground-state energy 
as a function of temperature for two randomly chosen 
instances. The mixing times of the two instances are r ~ 10^ 
(“easy”) and r « 10® (“hard”). Unlike the easy sample, the 
hard sample exhibits “temperature chaos”: Upon lowering 
the temperature, the energy decreases at first in a gradnal 
manner, however at T « 0.2 there is a sudden drop indicat¬ 
ing that a different set of minimizing configurations has been 
visited. Inset: Main panel’s data vs. exp(—A/T) (where 
A = 2 is the excitation gap). A linear behavior is expected 
if the system can be described as a gas of non-interacting ex¬ 
citations over a local energy minimum. For the easy sample, 
this local minimum is a ground state. On the other hand, 
for instances displaying TC above their chaos temperature, 
the local-minimum energy is higher than the ground-state’s. 
Bottom: r-dependence of the median systematic er¬ 
ror (for each r-generation) of a T —> 0 extrapolation 
of the total energy. For each instance, we extrapolated the 
T = 0.2, 0.3 data linearly in exp(—A/T) and compared this 
extrapolation with the actual ground-state energy. 


states, it has become only natural to ask whether quan¬ 
tum annealers can be used to solve ‘TC-ridden’ optimiza¬ 
tion problems faster than classical techniques such as PT. 
In this context, the question of how the performance of 
quantum annealers depends on the ‘classical hardness’ 
becomes of fundamental interest: If indeed quantum an¬ 
nealers exploit quantum phenomena such as tunneling to 
traverse energy barriers, one may hope that they will not 
be as sensitive to the thermal hardness (as defined above) 
of the optimization problems they solve. As we shall see 
next, having a practical definition for classical hardness 
allows us to address the above questions directly. 

To illustrate this, in what follows we apply the ideas 
introduced above to the DW2 quantum annealing opti¬ 
mizer in order to infer its degree of ‘classicality’. We ac¬ 
complish this by first generating an ensemble of instances 
that are directly embeddable on the DW2 ‘Chimera’ ar¬ 
chitecture [the reader is referred to Appendix for a 
detailed description of the Chimera lattice and the D- 
wave chip and its properties]. The chip on which we 
perform our study is an array of 512 superconducting 
flux qubits of which only 476 are functional, operating 


at a temperature of ~ 15mK. The DW2 chip is designed 
to solve a very specific type of problems, namely, Ising- 
type optimization problems, by adiabatically transition¬ 
ing the system Hamiltonian from an initial transverse- 
field Hamiltonian to a final classical programmable cost 
function of a typical spin glass. The latter is given by 
the Ising Hamiltonian: 

.^Ising — ^ ^ JijSiSj T ^ ] hiSi . (1) 

(U> » 

The Ising spins, Si = ±1 are the variables to be optimized 
over, and the sets {Jij} and {hi\ are programmable pa¬ 
rameters of the cost function. Here, {ij) denotes a sum 
over all the active edges of the Chimera graph. For 
simplicity, we conduct our study on randomly-generated 
problem instances with hi = 0 and random, equiprobable 
Jij = ± J couplings (in our energy units J = 1). 

Initially, it is not clear whether the task of find¬ 
ing thermally-hard instances on the Chimera is feasible. 
While on the one hand TC has been observed in spin- 
glasses on the square lattice 133 , which has the same spa¬ 
tial dimension, D = 2, as the Chimera [31], it has also 
been found that typical Chimera-embeddable instances 
are easy to solve puniisi]. As discussed above, sys¬ 
tem size plays a significant role in this context, as an 
N ~ 512-spin Chimera may simply be too small to have 
instances exhibiting TCQ Taking a brute-force approach 
to resolve this issue, we generated ^ 80,000 random 
problem instances (each characterized by a different set 
of {Jij}), analyzing each one by running them on a state- 
of-the-art PT algorithm until equilibration was reached 
(Appendix E- This allowed for the calculation of the 
instances’ classical hardness, namely their temperature 
mixing times r (for more details, see Appendix be¬ 
low) . The resulting distribution of t over the instances is 
shown in Fig. While most instances equilibrate rather 
quickly (after some 10"^ MC steps), we find that the distri¬ 
bution has a ‘tail’ of hard samples with r > 10® revealing 
that hard instances, although rare, do exist (we estimate 
that 2 samples in lO’^ have r > 10^). 

To study the DW2 chip, we grouped together instances 
with similar classical hardness, i.e., similar mixing times, 
10^ < r < 3 • 10^ for A: = 3,4, 5, 6 and 7. For each such 
‘generation’ of r, we randomly picked 100 representative 
instances for running on the chip (only 14 instances with 
k = 7 were found). As a convergence test of PT on 
the selected instances, we verified that the ground-state 
energies reached by PT are the true ones by means of an 
exact solver. 

At the purely classical level, we found, as antici¬ 
pated that classically hard instances differ from easy 
instances from a thermodynamic point of view as well. 


^ For instance, on the square lattice one needs to reach N ~ 512^ ^ 
2.5 X 10® spins for TC to be the rule rather than the excep¬ 
tion 1371 . 
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Specifically, large r instances were found to exhibit sharp 
changes in the average energy at random critical tem¬ 
peratures, consistently with the occurrence of TC (see 
Fig.[|). For such instances, the true ground states are 
present during the simulations only below the TC criti¬ 
cal temperatures. As the inset shows, the larger r is, the 
lower these critical temperatures typically are. Further¬ 
more, classically-hard instances were found to differ from 
easier ones in terms of their energy landscape: While for 
easy instances minimally-excited states typically reside 
only a few spin flips away from ground state configura¬ 
tions, for classically hard instances, this is not the case 
(see Appendix . 


IV. CLASSICALITY OF THE “D-WAVE TWO” 
CHIP 


Having sorted and analyzed the randomly-generated 
instances, we turned to experimentally test the perfor¬ 
mance of the D-Wave chip on these (for details see Ap¬ 
pendix 1^ and C 4). Our experiments consisted of pro¬ 
gramming the chip to solve each of the 414 instances over 
a dense mesh of annealing times in the available range of 
ta S [20^5, 20tos]. The number of attempts, or anneals, 
that each instance was run for each choice of ranged 
between 10^ and 10®. By calculating the success prob¬ 
ability of the annealer for each instance and annealing 
time, a typical time-to-solution 4 was obtained for each 
hardness-group, or ‘r-generation’ (see Appendix [A| . In¬ 
terestingly, we found that for easy samples (r = 10®) the 
success probability depends only marginally on taj point¬ 
ing to the annealer reaching its asymptotic performance 
on these. As instances become harder, the sensitivity of 
success probability to ta increases significantly. Nonethe¬ 
less, for all hardness groups, the typical tg is found to 
be shortest at the minimally-allowed annealing time of 
fa = 20/rs (see Appendix C 4 for a more detailed discus¬ 
sion). 

The main results of our investigation are summarized 
in Fig. depicting the typical time to solution ts of 
the DW2 chip (averaged over instances of same hardness 
groups, see Appendix]^ as a function of classical hard¬ 
ness, or ‘r-generation’. As is clear from the hgure, the 
performance of the chip was found to correlate strongly 
with the ‘thermal hardness’ parameter, indicating the 
significant role thermal hardness plays in the annealing 
process. Interestingly, the chip’s response was found to 
be affected by thermal hardness even more than PT, i.e., 
more strongly than the classical thermal response: While 
for PT the time-to-solution scales as ts ~ r, the scaling 
of the D-Wave chip was found to scale as ~ 7 -“dw 2 ^ 
with Q!dw 2 ~ 1.73. This scaling is rather surprising given 
that for quantum annealers to perform better than clas¬ 
sical ones, one would expect these to be less susceptible 
to thermal hardness, not more. Nonetheless, it is clear 
that the notion of classical hardness is very relevant to 
the D-wave chip. 



FIG. 3. Dependence of typical time to solution ta of 
the examined optimization algorithms on mixing time 
T, the classical hardness parameter. Classical thermal 
algorithms scale linearly with r. Here, PTgq denotes time to 
equilibration which by definition scales linearly with r, and 
PTheur denotes PT functioning as a heuristic solver in which 
case the time to solution is the number of Monte-Carlo steps 
to first encounter of a minimizing configuration. The ta of 
the classical non-thermal HFS algorithm (measured in ps) 
scales as with ohfs ~ 0.3. The ta for the DW2 chip 

(measured in ps) scales as t°‘ow 2 q:dw 2 ~ 1.73 (we note 
the missing error bars on the 10^ DW2 data point, which is a 
result of insufficient statistics). 

To complete the picture, we have also tested our 
instances on the Hamze-de Freitas-Selby (HFS) algo¬ 
rithm [39l |40j , which is the fastest classical algorithm to 
date for Chimera-type instances. Even though the HFS 
algorithm is a ‘non-thermal’ algorithm (i.e., it does not 
make use of a temperature parameter), we have found 
the concept of classical hardness to be very relevant here 
as well. For the HFS algorithm, we find a scaling of 
^hfs ^ .^aHFs with q;hfs ~ 0.3, implying that the algo¬ 
rithm is ^nificantly less susceptible to thermal hardness 
than PTIj 


V. ANALYSIS OF FINDINGS 

The above somewhat less-than-favorable performance 
of the experimental D-wave chip on thermally-hard prob¬ 
lems is not necessarily a manifestation of the intrinsic 
limitations of quantum annealing, i.e., it does not neces¬ 
sarily imply that the ‘quantum landscape’ of the tested 
problems is harder to traverse than the classical one (al¬ 
though this may sometimes be the case mma) . A care¬ 
ful analysis of the results suggests in fact at least two 
different more probable ‘classical’ causes for the chip’s 
performance. 


^ It is worth pointing out that the typical runtime for the HFS 
algorithm on the hardest, r = 10^, group problems was found 
to be ~ 0.5s on an Intel Xeon CPU E5462 @ 2.80GHz, which 
to our knowledge makes these the hardest known Chimera-type 
instances to date. 
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First, as already discussed above and succinctly cap¬ 
tured in Fig. temperature is expected to play a key 
role in DW2 success on instances exhibiting TC. This is 
because for these, the ground state conhgurations min¬ 
imize the free energy only below the lowest critical TC 
temperature. Even though the working temperature of 
the DW2 chip is rather low, namely ~ 15mK, the crucial 
figure of merit is the ratio of coupling to temperature 
Tj J [recall Eq. 01- Although the nominal value for the 
chip isT! J « 0.1, any inhomogeneity of the temperature 
across the chip may render the ratio higher [33] , possibly 
driving it above typical TC critical temperaturesj^ 

Another possible cause for the above scaling may be 
due to the analog nature of the chip. The program¬ 
ming of the coupling parameters Jy and magnetic helds 
hi is prone to statistical and systematic errors (also re¬ 
ferred to as intrinsic control errors, or ICE). The cou¬ 
plings actually encoded in DW2 are = ±J -\- R, 
where R ^ A/’(0, 5J) is a random error {5J « 0.05 J, 
according to the chip’s manufacturer). Unfortunately, 
even tiny changes in coupling values are known to po¬ 
tentially change the ground-state conhgurations of spin 
glasses in a dramatic manner [3SJ|35Mj. We refer to this 
effect as ‘coupling chaos’ (or J-chaos, for short). For an 
Wbit system, J-chaos seems to become signihcant for 
SJcrit ~ |J|/A^“. Empirically (35] ST] a « l/D, D be¬ 
ing the spatial dimension of the system. Note, however, 
that these estimates refer only to typical instances and 
small N whereas the assessment of the effects of J-chaos 
on thermally-hard instances remains an important open 
problem for classical spin glasses. 

Here, we empirically quantify the effects of J-chaos 
by taking advantage of the many programming cycles 
and gauge choices each instance has been annealed 
with (typically between 200 and 2000). Calculating a 
success probability p for each cycle, we compute the 
probability distribution of p over different cycles for 
each instance [211 We hnd that while for some 

instances p is essentially insensitive to programming 
errors, for other instances (even within the same thermal 
hardness group), p varies significantly, spanning several 
orders of magnitude. This is illustrated in Fig. [^ 
which presents some results based on a straightforward 
percentile analysis of these distributions. For instance 
in the figure, the 80th percentile probability is 
Iq=o.8{p) = 0.669(3), whereas the probability at the 
90th percentile is Iq=o. 9 {p) = 0.698(4). Hence the ratio 
i? 8,9 = Iq=o,8(p)/Iq=o. 9 {p) is close to One. Conversely, 
for instance #35, the values are Iq=o.8{p) — 0.008(1), 


® We refer to the physical temperature of the chip. However, non- 
equilibrium systems (e.g., supercooled liquids or glasses) can be 
characterized by two temperatures 04]: On the one hand, the 
physical temperature T which rules fast degrees of freedom that 
equilibrate. On the other hand, the ‘effective’ temperature refers 
to slow degrees of freedom that remain out of equilibrium. We 
are currently investigating whether or not the DW2 chip can 
analogously be characterized by two such temperatures. 



10 '^ 10 "'* 10 '^ 10 '^ 10 ’' 10 ° 
Success probability in a single cycle 


FIG. 4. Empirical evidence for the absence/presence 
of ‘J-chaos’. Probability density of success probability of 
a single cycle, p, over different programming cycles. The 
probability densities are plotted here for two easy (r = 10^) 
instances (here, ta, = 20/rs and the number of anneals per 
programming cycle is A = 49500, see Appendix]^ Instance 
#1 (662 cycles) is typical in this r-generation with success 
probability p ~ 1 in the majority of the programming cy¬ 
cles. On the other hand, instance #35 (1624 cycles) suffers 
from strong J-chaos: Even though the probability of finding 
p ~ 0.1 in some of the programming cycles is not negligible, 
most cycles are significantly less successful, e.g., the median p 
is #= 0.5 = 6.7(5) X 10~'*. Inset: Typical ratio of the 80th per¬ 
centile probability to the 90th percentile probability, namely 
As ,9 = #= 0 . 8 / 75 = 0.9 (see text) as a function of r (for r > 10° 
and 10 ^, As ,9 was not computed due to extremely low success 
probabilities). Smaller ratios indicate larger fluctuations in 
success probabilities. 

Iq=o. 9 {p) = 0.07(2) and the ratio is # 3,9 = 0.12, i.e., the 
success probability drops by an order of magnitude. The 
inset of Fig. shows the typical ratio # 3,9 as a function 
of classical hardness, demonstrating the strong corre¬ 
lation between thermal hardness and the devastating 
effects of J-chaos caused by ICE, namely that the larger 
T is, the more probable it is to find instances for which 
p varies wildly between programming cycles. 


VI. DISCUSSION 

We have devised a method for quantifying the ‘clas- 
sicality’ of quantum annealers by studying their perfor¬ 
mance on sets of instances characterized by different de¬ 
grees of thermal hardness, which we have defined for that 
purpose as the mixing (or equilibration) time r of classi¬ 
cal thermal algorithms (namely, PT) on these. We find 
that the 2D-hke Chimera architecture used in the D- 
Wave chips does give rise to thermally very-hard, albeit 
rare, instances. Specifically, we have found samples that 
exhibit temperature chaos, and as a such have very long 
mixing times, i.e., they are classically exceptionally hard 
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to solve. 

Applying our method to an experimental quantum an¬ 
nealing optimizer, the DW2 chip, we have found that 
its performance is more susceptible to changes in ther¬ 
mal hardness than classical algorithms. This is in con¬ 
trast with the performance of the best-known state-of- 
the-art classical solver on Chimera graphs, the ‘non- 
thermal’ HFS algorithm, which scales (unsurprisingly) 
better. Our results are not meant to suggest that the 
DW2 chip is not a quantum annealer, but rather that 
its quantum properties may be significantly masked by 
much-undesired classical effects. 

We have identihed and quantified two probable causes 
for the observed behavior: A possibly too high tempera¬ 
ture, or more probably, J-chaos, the random errors stem¬ 
ming from the digital-to-analog conversion in the pro¬ 
gramming of the coupling parameters. One may hope 
that the scaling of current DW2 chips would significantly 
improve if one or both of the above issues are resolved. 
Clearly, lowering the temperature of the chip and/or re¬ 
ducing the error involved in the programming of its pa¬ 
rameters are both technologically very ambitious goals, 
in which case error correcting techniques may prove very 
useful [IH]. We believe that quantum Monte Carlo sim¬ 
ulations of the device will be instrumental in the under¬ 
standing of the roles that temperature and magnitude 
of programming errors play in the performance of the 
chip (and of its classical counterparts). In turn, this 
will help sharpening the most pressing technological chal¬ 
lenges facing the fabrication of these and other future 
quantum optimizing devices, paving the way to obtain¬ 
ing a long-awaited insights as to the difference between 
quantum and classical hardness in the context of opti¬ 
mization. We are currently pursuing these approaches. 
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Appendix A: Methods 

Computation of the mixing time r.— Because 
we follow Ref. [20], we just briefly summarize here the 
main steps of the procedure. Considering one of the Nt 
system copies in the PT simulation, let us denote the 
temperature of copy i at Monte Carlo time t by 
where 1 < z* < Nt (see inset of Pig. [^. At equi¬ 
librium, the probability distribution for it is uniform 
(namely, 1/Nt) hence the exact expectation value of it 
is {it) = {Nt + l)/2. Prom the general theory of Markov 
Chain Monte Carlo [l8|, it follows that the equilibrium 
time-correlation function may be written as a sum of ex¬ 
ponentially decaying terms: 

C'pt(s) = {itit+s) - (Al) 

= ^ {ti> T2> 

n 

The mixing time r is the largest ‘eigen-time’ ri. We 
compute numerically the correlation function C'pt(s) and 
fit it to the decay of two exponential functions (so we 
extract the dominant time scale r and a sub-leading time 
scale). The procedure is described in full in Ref. [2U] . 

D -wave data acquisition and analysis.— 

Data acquisition: 

In what follows we briefly summarize the steps of the 
experimental setup and data acquisition for the anneals 
performed on the 414 randomly-generated instances in 
the various thermal-hardness groups. 

1. The Jij couplings of each of the 414 instances have 
been encoded onto the D-Wave chip using many 
different choices of annealing times in the allowed 
range of 20^s< ta < 20ms. 

2. Por each instance and each choice of t^ the following 
process has been repeated hundreds to thousands 
of times: 

(a) Pirst, a random ‘gauge’ has been chosen for 
the instance. A gauge transformation does 
not change the properties of the optimization 
problem but has some effect on the perfor¬ 
mance of the chip which follows from the im¬ 
perfections of the device that break the gauge 
symmetry. The different gauges are applied by 
transforming the original instance hi —>■ rjihi, 
Jij rjirjjJij, to the original cost function 
Eq. Q . The above gauge transformations cor¬ 
respond to the change Si —>■ rjiSi in configura¬ 
tion spin values. Here, the N gauge parame¬ 
ters r]i = ±1 were chosen randomly. 
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(b) The chip was then programmed with the 
gauge-transformed instance (inevitably 
adding programming bias errors, as men¬ 
tioned in the main text). 

(c) The instance was then solved, or annealed, X 
times within the programming cycle/with the 
chosen gauge. We chose X « Isec/ta, the 
maximally allowed amount. 


(d) After the X anneals were performed, the 
number of successes Y, i.e., the number of 
times a minimizing configuration had been 
found, was recorded. The probability of 
success for the instance, for that particular 
gauge/programming cycle and annealing time 
ta was then estimated as p = Y/X. Note, 
that in cases where the probability of success 
is of the order of l/A, the probability p will 
be a rather noisy estimate (see data analysis, 
below, for a procedure to mitigate this prob¬ 
lem). 


(e) If the prefixed number of cycles for the current 
instance and ta has not been reached, return 
to (2a I and choose a new gauge. 


Data analysis and time-to-solution estimates: 

The analysis of the data acquisition process described 
above proceeded as follows. 

1. For each instance and each annealing time, the to¬ 
tal number of hits Ftot = calculated, 

where i sums over all the gauge/programming cy¬ 
cles. Denoting Atot = ^he total number 

of annealing attempts, the probability of success for 
any particular instance and anneal time was then 
calculated as P = Ttot/Atot- 

2. The above probability was then converted into an 
average time-to-solution ts for that instance and 
ta according to 4 = ta.lP, where the special case 
of P = 0 designates an estimate of an infinite ts, 
where in practice the true probability lies below the 
resolution threshold of l/Atot- 

3. A typical runtime for a hardness group was then 
obtained by taking the median over all minimal ts 
values of all the instances in the group. 


Appendix B: Instance generation and analysis 
1. The Parallel Tempering simulations 

We briefly outline the technical details of the Paral¬ 
lel Tempering (PT) simulations and subsequent analysis 
performed on the randomly generated instances. The 
reader is referred to Ref. [501 ^or further details. 


To run the simulations, we employed multi-spin cod¬ 
ing |49j . a simple yet efficient form of parallel computa¬ 
tion, that allows the simultaneous simulations of a large 
number of problem instances. The name multi-spin fol¬ 
lows from the fact that our dynamic variables are binary. 
Si = ±1, and thus can be coded in a single bit. Hence, 
one may code the i-th spin corresponding to M indepen¬ 
dent instances in a single M-bit computer word. Using 
streaming extensions, M can be as large as M = 256, 
nowadays]^ The advantages of simulating in parallel 256 
instances are obvious, if we want to study a huge number 
of problems in our quest for these rare instances display¬ 
ing Temperature Chaos. In fact, we have simulated a 
total of Ns = 303 x 256 = 77568 problem instances. 

The temperature grid of the PT simulations consisted 
of Nt = 30 temperatures. Temperatures with indices 
i = 13,14,..., Nt were evenly distributed in the range 
0.21 < Ti < Tniax = 1.632, while lower tempera¬ 
tures in the range Tmin = 0.045 < Ti < 0.2 (indices 
j = 1,2,..., 12) have been added in order to detect tem¬ 
perature chaos effects}^ Ergodicity was maintained by 
the temperature-swap part of the PT. 

For each instance, we ran four independent simulations 
(i.e., four replicas) per temperature, where an elementary 
Monte Carlo step consisted of 10 full lattice Metropo¬ 
lis sweeps, followed by a PT temperature swap attempt. 
Since the Chimera lattice is bipartite, Metropolis sweeps 
have been carried out on alternating partitions at each 
step. 

The calculation of the mixing time t was conducted in 
three stages, or rounds (see also Ref. [20]). In the first 
stage, all 303 x 256 instances were simulated for a total of 
10® elementary Monte Carlo steps (i.e., each system was 
simulated for 10^ full-lattice Metropolis sweep). At the 
end of each round the mixing time r was computed (mea¬ 
sured in units of full-lattice Metropolis sweeps). As can 
also be read off Fig. 1 of the main text, the first-round 
of simulations was adequate for the equilibration of most 
instances, namely, for problems of r-generations 10®, 10^ 
and 10®. The second round of simulations was set to 
last 10 times longer, consisting of 10^ elementary Monte 
Carlo steps, or 10® full-lattice Metropolis sweeps per sys¬ 
tem. These longer simulations were reserved only to the 
1024 hardest instances corresponding to r-generations of 
10® or largerj^ The 256 worse-scoring instances of these 


Simulations were run on a standard Intel(R) Xeon(R) processors 
(E5-2690 0 @ 2.90GHz). 

^ In our simulations, the minimal temperature value Tmin = 0.045 
corresponds effectively to zero temperature. This follows from 
the minimal energy gap of our instances being A = 2 together 
with our use of a 64-bit (pseudo) random-number generator [50]. 
Setting Tmin such that it obeys 2®^ xe our Metropo¬ 

lis simulations at Tmin effectively become equivalent to T = 0 
simulations. 

^ The criterion for selecting the hardest instances to qualify for 
subsequent rounds of simulations was done by assigning a figure 
of merit for each instance based on the performance of its 4 x 




were further simulated for 10® elementary Monte Carlo 
steps (or 10® full-lattice Metropolis sweeps per system). 
This simulation lasted 10 days on our fastest CPU [In- 
tel(R) Core(TM) i7-4770K CPU @ 3.50GHz]. From it, 
we obtained 14 extremely hard problem instances, be¬ 
longing to T-generation 10^. 

2. Spin-overlap analysis 

Here, we briefly extend the analysis of the energy land¬ 
scapes of thermally-hard problems, initiated in Fig. 2 of 
the main text. To do so, we make use of the notion of 
‘spin-overlap’, which plays a major role in any spin glass 
investigation (see, e.g.. Ref. [H]). The spin overlap q be¬ 
tween two iV-spin configurations {s“} and {sj} is defined 
as 

q=l-2^, (Bl) 

where A^a,b is the Hamming distance between the two 
configurations, i.e., the number of spins by which the two 
configurations differ. For two identical configurations we 
get Na^b = 0 and < 7=1 while for two configurations 
differing by a global spin flip (i.e., s“ = —s\ for all i) 
we have = N and q = —\. Since the instances 
we study all have hi = 0, their Hamiltonians possess a 
global bit-flip symmetry. For any configuration {s“}, its 
spin-reversed configuration {s(’} = — {s“} has the same 
energy. We have therefore chosen to consider the more 
informative measure of the absolute value of the overlap, 
Iqj. In this case, the maximum meaningful Hamming 
distance between any two configurations is N/2. 

We utilized spin overlaps to investigate the energy 
landscapes of the various instances in the following way. 
During the PT simulations, 12000 spin configurations 
of each instance were stored on disk, corresponding to 
100 evenly spaced check-points in Monte Carlo time 
where every check-point contains 4 x 30 = 120 confi^ 
urations. Of those, ground-state (GS) configuration^ 
and minimally-excited (ES) states were picked out for 
further analysis. Our analysis consisted of examining 
the probability distributions of overlaps between (i) ran¬ 
domly chosen GS configurations and randomly chosen 
ES configurations [GS-ES, solid blue curves in Fig. of 
Extended Data (ED)] and (ii) pairs of randomly chosen 
GS configurations (GS-GS, dashed red curves in Fig. [^. 
Two extremal cases were encountered. For easy instances 


30 = 120 system copies. For each of the copies, the fraction 
of Monte Carlo time spent at the higher-temperatures regions, 
namely Ti with i = 16,17,..., was calculated where the 
figure of merit was chosen to be the smallest of these fractions, 
which was used as an indication for how trapped the instance is 
in the low-temperatures region. 

^ For a J = ±1 spin-glass, the ground state is typically highly 
degenerate even in two dimensions ED. Empirically, we find that 
this high-degeneracy is also present on the 2D-like Chimera m- 
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FIG. 5. Probability distribution of the spin overlap |q| 
between ground states and minimally excited states 
(GS-ES, solid blue curves). For comparison, we also show 
the overlap between different ground states (GS-GS, dashed 
red curves). The distributions shown were computed for the 
same instances considered in Fig. 2 of main text, with mix¬ 
ing time generations r « 10'^ (top) and r « 10® (bottom). 
Inset: Dependence of the overlap between ground 
states and minimally excited states on r. Typical me¬ 
dian GS-ES overlap |g| averaged over each hardness group as 
a function of r-generation (blue points, the lines are to guide 
the eye).The red points are GS-GS typical overlaps shown for 
comparison. 

(Fig. [^top), the probability density for GS-GS over¬ 
laps is very similar to the probability density for GS-ES 
overlaps. This is expected in cases where ES states are 
trivially connected to GS states via one spin-flip. Gon- 
versely, for hard instances (Fig. [^bottom), the probabil¬ 
ity densities for GS-GS overlaps and for GS-ES overlaps 
differ significantly, indicating that the vast majority of 
the ES states encountered during the simulations are not 
trivially connected to typical GS configurations but are 
rather very distant from them. 

The top and bottom panels of Fig. describe only 
two representative instances, however the above depic¬ 
tion was also found to be valid in the general case, as 
is confirmed by calculation of the typical jqj for all the 
instances in each of the hardness groups]^ As the inset of 
Fig. [5] shows, the harder instances are, the further away 
minimally excited states are from ground states. 


In order to define a typical |g|, we follow a two-steps procedure. 
First, we obtain a typical \q\ for each instance by computing 
the median overlap [e.g. for the easy instance in the top panel 
of Fig. med(|g|) = 0.918, while for the hard instance in the 
bottom panel meddgl) = 0.25]. Second, we average over all the 
instances in a r-generation, by computing the median over the 
instance medians. 
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FIG. 6. The 476-qubit DW2 device architecture and 
qubit connectivity. 


Appendix C: The D-Wave Two Chip 

1. The Chimera 

The Chimera graph of the D-Wave Two (DW2) chip 
used in this study is shown in Fig. The chip is an 
8 x 8 array of unit cells where each unit cell is a balanced 
A 4 4 bipartite graph of superconducting flux qubits. In 
the ideal Chimera graph the degree of each (internal) 
vertex is 6 . On our chip, only a subset of 476 qubits, is 
functional. The temperature of the device ~ 15mK. 

The chip is designed to solve a very specific type 
of problems, namely, Ising-type optimization problems 
where the cost function is that of the Ising Hamiltonian 
[see Eq. (1) of the main text]. The Ising spins, Si = ±1 
are the variables to be optimized over and the sets {Jy } 
and {hi\ are the programmable parameters of the cost 
function. In addition, {ij) denotes a sum over all active 
edges of the Chimera graph. 


2. The D-Wave Two annealing schednle 

The DW2 performs the annealing by implementing the 
time-dependent Hamiltonian 

H{t) = -A{t) ^ af + , (Cl) 

i 

with t € [ 0 , tg] where the allowed range of annealing times 
ta, due to engineering restrictions, is between 20 /rs and 
20ms. The annealing schedules A{t) and B{t) used in 
the device are shown in Fig. There are four anneal¬ 
ing lines, and their synchronization becomes harder for 
faster annealers. The hltering of the input control lines 
introduces some additional distortion in the annealing 
control. 



FIG. 7. Annealing schedule of the D-Wave chip. The 

functions A{t) and B{t) are the amplitudes of the (transverse- 
field) driver and classical Ising Hamiltonians, respectively. 
Also shown is the temperature in units of energy (fcs = 1). 


3. Gauge averaging on the D-Wave device 

Calibration inaccuracies stemming mainly from the 
digital to analog conversions of the problem parameters, 
cause the couplings and hi realized on the DW2 chip 
to be slightly different from the intended programmed 
values (with a typical ^ 5% variation). Therefore, in¬ 
stances encoded on the device will be generally different 
from the intended instances. Additionally, other, more 
systematic biases exist which cause spins to prefer one 
orientation over another regardless of the encoded pa¬ 
rameters. To neutralize these effects, it is advantageous 
to perform multiple annealing rounds (or ‘programming 
cycles’) on the device, where each such cycle corresponds 
to a different encoding or ‘gauge’ of the same problem 
instance onto the couplers of the device [S]. To real¬ 
ize these different encodings, we use a gauge freedom 
in realizing the Ising spin glass: for each qubit we can 
freely define which of the two qnbits states corresponds 
to Si = -|-1 and Si = — 1 . More formally this corresponds 
to a gauge transformation that changes spins Si —>■ r]iSi, 
with ? 7 i = ±1 and the couplings as Jij —)■ rjirjjJij and 
hi —>■ ? 7 ihi. The simulations are invariant under such a 
gauge transformation, but due to calibration errors which 
break the gauge symmetry, the results returned by the 
DW2 are not. 


4. Performance of the DW2 chip as a function of 
annealing time 

Since the DW2 chip is a putative quantum annealer, it 
is only natural to ask how its performance, namely, the 
typical time-to-solution 4 it yields, depends on annealing 
time tg. Ideally, the longer tg is, the better the perfor¬ 
mance we expect [SI]. However, in practice, decohering 
interactions with the environment are present which be- 
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come more pronounced with longer running times of the 
annealing process. It is therefore plausible to assume that 
there is an optimal ta for which tg is shortest [3]. 

Analysis of the dependence of success probabilities, or 
equivalently times to solution, on annealing time is found 
to be heavily blurred, or masked, by fluctuations in the 
success probability between different programming cy¬ 
cles. These fluctuations, which become more pronounced 
for harder instances, stem from the noisy encoding of 
the instance parameters already discussed above. Any 
meaningful analysis of the dependence of success proba¬ 
bility on annealing time must therefore successfully av¬ 
erage out these effects therefore requires many rounds 
of anneals. Unfortunately (see data acquisition in Ap¬ 
pendix 0 the number of annealing attempts X for a 
given DW2 programming cycle is proportional to 1/ta, 
ranging from ^ 49500 ^ ^t,,= 20 ms = 49 , As 

a consequence, the minimal success probability that can 
be measured from a fa = 20 tos programming cycle is just 
— 1/49 « 0.02. While this limitation is not seri¬ 
ous for easy instances (i.e., those of the r = 10 ^ group), 
for hard instances a typical success probability p is much 
smaller than 0.02. This problem can be alleviated by run¬ 
ning the DW2 chip on the same instance (and ta) multiple 
number of times (with a different gauge for each cycle, 
but with fa held fixed), and then averaging the result¬ 
ing p over programming cycles. In this way, the minimal 
success probability that can be measured is l/(ArAfcycies)- 
For Ncycies in the hundreds typically, typical numbers for 
the minimal measurable success probability were 

« 6.5 X 10-® , « 3.3 X 10-® , 

pi-- « 4 X 10-5 and « 5.1 x lO"" . 


However, we have found that the above minimal- 
probability thresholds are still too high for our hard¬ 
est problems, r > 10®. To overcome this problem, we 
groups the success probabilities into annealing-time win¬ 
dows: 20 ps < fa/ 10 ^ < 60ps and 60ps < ta/ 10 ^ < 200 ps 
for k = 0 , 1,2 (where in the largest-time interval, k = 2 
above, we also included the = 20 ms data). 

At this point, the success probability needs to be aver¬ 
aged over the problem instances of a given r-generation. 
Given the extreme problem-to-problem fluctuations, per¬ 
centiles have been calculated. In Fig. [^top we show the 
50th percentile (i.e., the median) as a function of prob¬ 
ability. Unfortunately, in spite of our efforts to increase 
the experimental sensitivity, the measured success prob¬ 
ability yielded zero for more than a half of the instances 
with T = 10® and r = 10®. Hence, we show Fig. [^bottom 
the results for the 80th percentile. 

In all cases, we found that a power law description 

p(r,4)-^i^\ (C2) 


is adequate, although the exponent 0 depends signifi¬ 
cantly both on r and on the percentile considered (see 
Fig[^. The trends are very clear. For easy instances, p 
barely depends on ta (yielding 0 « 0). In fact, the ex¬ 
ponent 9 increases with increasing r, meaning that the 



q (/^s) 


FIG. 8. Dependence of success probability on anneal¬ 
ing time. Typical success probability (see main text for de¬ 
tails) as a function of annealing time for the various hardness 
groups. Shown are the 50th and 80th percentiles within each 
hardness group (top and bottom panels, respectively). 
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FIG. 9. Dependence of success p rob ability on anneal¬ 
ing time. The exponent 9 of Eq. (C2|, as computed from 
the data shown in Fig. plotted against r-generation. 


harder the instance is, the more it typically benefits from 
increasing ta- 

Recalling that time to solution is given by tg = ta,/p 
, we find that the exponent 6 in Eq. (C2) is less 
than 1 for all hardness groups, i.e., that the shorter the 
annealing time is, the shorter the time-to-solution be¬ 
comes. Since however this trend can not hold all the way 
down to ta = 0 , these results therefore imply that there 
exist for each group an optimal annealing time that is 
however below the shortest-accessible ta = 20/iS. Fur¬ 
thermore, the increase of 9 with hardness group can be 
interpreted as the harder the instances are, the longer the 
typical optimal annealing time is, consistently with what 
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one would expect from a quantum annealer. It is im¬ 
portant to note at this point that the highly-fluctuating 
success probability, stemming from programming errors, 


unfortunately does note allow for a more sensitive analy¬ 
sis and that more robust results could be obtained if the 
programing errors leading to J-chaos were to be reduced. 
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